Skip to content

Conversation

@zachdworkin
Copy link
Contributor

Attempting fi_av_lookup with a fi_addr larger than the max entry number of the AV will result in undefined behavior. It is expected by an application to not request lookup of an out of bounds fi_addr.

Attempting fi_av_lookup with a fi_addr larger than
the max entry number of the AV will result in undefined
behavior. It is expected by an application to not
request lookup of an out of bounds fi_addr.

Signed-off-by: Zach Dworkin <[email protected]>
@zachdworkin
Copy link
Contributor Author

DISCUSSION:
Behavior of fi_av_lookup is undefined for is an application attempts to lookup an address outside of the AV range.
Example: Create an AV table with 32 entries and lookup fi_addr 33.

Current behavior for this case is provider specific. Most use the util av code which does not check for this error and will segfault (if on non-debug build) in ofi_mem.h:ofi_bufpool_get_ibuf:501 when attempting to access a region in the table that is out of bounds. Providers like ucx specifically check for this case and return -FI_EINVAL if the requested fi_addr is out of bounds.

The next common error case is supported where 4 addresses are inserted into an AV with max 32 entries, fi_addr 3 is removed, and then fi_addr 3 is looked up. This is handled and will return -FI_ADDR_NOTAVAIL.

The out of bounds lookup case must either be defined as unsupported in the man page, or a common error such as -FI_EINVAL or -FI_ADDR_NOTAVAIL must be returned to let an application continue if the lookup "fails". I believe returning -FI_ADDR_NOTAVAIL makes the most sense for both of these error cases so the application can handle them both similarly.

No decision will be made until @j-xiong returns from vacation but please post thoughts here.

@j-xiong
Copy link
Contributor

j-xiong commented Nov 18, 2025

FI_ADDR_NOTAVAIL is not an error code. It is a special value for fi_addr_t.

@aingerson
Copy link
Contributor

I don't think there needs to be a man page distinction between "out of bounds" and not a valid fiaddr. I think they should both return the same error that indicates the fiaddr is not valid.
The application doesn't know whether something is "out of bounds" because the implementation is provider-specific. For example if an address gets removed, that index can be reused or not. That implementation is determined by the provider so the application doesn't know whether a value is within or outside of the bounds of the array.
Providers should be able to easily determine whether the value is within bounds or not for their own implementation so I don't think it's a huge ask.
I agree that the man page needs to be expanded to include the return the application should expect but I wouldn't say it should be left as "undefined." Providers can and should handle those in the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants